representer theorem

In statistical learning theory, a representer theorem is any of several related results stating that a minimizer f^{*} of a regularized empirical risk functional defined over a reproducing kernel Hilbert space can be represented as a finite linear combination of kernel functions evaluated at the input points of the training set.
==Formal Statement==
The following Representer Theorem and its proof are due to Schölkopf, Herbrich, and Smola:
Theorem: Let \mathcal{X} be a nonempty set and k a positive-definite real-valued kernel on \mathcal{X} \times \mathcal{X} with corresponding reproducing kernel Hilbert space H_k. Given a training sample (x_1, y_1), \dotsc, (x_n, y_n) \in \mathcal{X} \times \R, a strictly monotonically increasing real-valued function g \colon [0, \infty) \to \R, and an arbitrary empirical risk function E \colon (\mathcal{X} \times \R^2)^n \to \R \cup \lbrace \infty \rbrace, then for any f^{*} \in H_k satisfying
:
f^{*} = \operatorname{argmin}_{f \in H_k} \left\lbrace E\left( (x_1, y_1, f(x_1)), \dotsc, (x_n, y_n, f(x_n)) \right) + g\left( \lVert f \rVert \right) \right\rbrace, \quad (*)

f^{*} admits a representation of the form:
:
f^{*}(\cdot) = \sum_{i=1}^n \alpha_i k(\cdot, x_i),

where \alpha_i \in \R for all 1 \le i \le n.
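
As an illustration, consider the standard special case of kernel ridge regression (not part of the statement above): take E to be the squared loss \sum_{i=1}^n (y_i - f(x_i))^2 and g(\lVert f \rVert) = \lambda \lVert f \rVert^2 with \lambda > 0, which is strictly increasing on [0, \infty) as required. Substituting the expansion f(\cdot) = \sum_{i=1}^n \alpha_i k(\cdot, x_i) guaranteed by the theorem reduces the minimization over the infinite-dimensional space H_k to a finite-dimensional problem in \alpha = (\alpha_1, \dotsc, \alpha_n)^\top, solved by
:
\alpha = (K + \lambda I)^{-1} y, \qquad K_{ij} = k(x_i, x_j), \quad y = (y_1, \dotsc, y_n)^\top.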
Proof:
Define a mapping
:
\varphi \colon \mathcal{X} \to \R^{\mathcal{X}}, \quad \varphi(x) = k(\cdot, x)

(so that \varphi(x) = k(\cdot, x) is itself a map \mathcal{X} \to \R). Since k is a reproducing kernel,
:
\varphi(x)(x') = k(x', x) = \langle \varphi(x'), \varphi(x) \rangle,

where \langle \cdot, \cdot \rangle is the inner product on H_k.
Given any x_1, \dotsc, x_n, one can use orthogonal projection to decompose any f \in H_k into a sum of two functions, one lying in \operatorname{span} \left\lbrace \varphi(x_1), \dotsc, \varphi(x_n) \right\rbrace, and the other lying in the orthogonal complement:
:
f = \sum_{i=1}^n \alpha_i \varphi(x_i) + v,

where \langle v, \varphi(x_i) \rangle = 0 for all i.
The above orthogonal decomposition and the reproducing property together show that applying f to any training point x_j produces
:
f(x_j) = \left\langle \sum_{i=1}^n \alpha_i \varphi(x_i) + v, \varphi(x_j) \right\rangle = \sum_{i=1}^n \alpha_i \left\langle \varphi(x_i), \varphi(x_j) \right\rangle,

which we observe is independent of v. Consequently, the value of the empirical risk E in (*) is likewise independent of v. For the second term (the regularization term), note that v is orthogonal to \sum_{i=1}^n \alpha_i \varphi(x_i), so the Pythagorean theorem in H_k together with the strict monotonicity of g gives
:
\begin{align}
g\left( \lVert f \rVert \right) &= g\left( \biggl\lVert \sum_{i=1}^n \alpha_i \varphi(x_i) + v \biggr\rVert \right) \\
&= g\left( \sqrt{ \biggl\lVert \sum_{i=1}^n \alpha_i \varphi(x_i) \biggr\rVert^2 + \lVert v \rVert^2 } \right) \\
&\ge g\left( \biggl\lVert \sum_{i=1}^n \alpha_i \varphi(x_i) \biggr\rVert \right).
\end{align}

Therefore setting v = 0 does not affect the first term of (*), while it strictly decreases the second term whenever v \ne 0. Consequently, any minimizer f^{*} of (*) must have v = 0, i.e., it must be of the form
:
f^{*}(\cdot) = \sum_{i=1}^n \alpha_i \varphi(x_i) = \sum_{i=1}^n \alpha_i k(\cdot, x_i),

which is the desired result.
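
To make the consequence of this result concrete, the following is a minimal numerical sketch in Python of the kernel ridge regression special case described after the theorem statement (squared loss, g(\lVert f \rVert) = \lambda \lVert f \rVert^2, and a Gaussian kernel); the data, kernel choice, and parameter values here are illustrative assumptions, not part of the theorem itself.

import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel k(s, t) = exp(-gamma * (s - t)^2),
    # a positive-definite kernel on R x R (an illustrative choice).
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=30)               # training inputs x_1, ..., x_n
y = np.sin(x) + 0.1 * rng.standard_normal(30)     # noisy targets y_1, ..., y_n

lam = 1e-2                                        # regularization weight lambda
K = rbf_kernel(x, x)                              # Gram matrix K_ij = k(x_i, x_j)

# For this E and g, substituting f = sum_i alpha_i k(., x_i) into (*)
# reduces the problem to the linear system (K + lam * I) alpha = y.
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)

# The minimizer is the finite kernel expansion guaranteed by the theorem,
# f*(t) = sum_i alpha_i k(t, x_i), evaluated here on a test grid.
t = np.linspace(-3.0, 3.0, 200)
f_star = rbf_kernel(t, x) @ alpha

print("max training residual |f*(x_j) - y_j|:", np.max(np.abs(K @ alpha - y)))

Note that the optimization never leaves the n-dimensional span of the kernel sections k(\cdot, x_i): only the n coefficients \alpha_i are computed, which is precisely what makes kernel methods over infinite-dimensional H_k computationally tractable.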
